Understanding and Improving Human Data Relations

Alex Bowyer

Additional Reference Information

ARI2 Additional Reference Information for Chapter 2

ARI2.1 Data Protection Terminology and a Legal Definition of Personal Data

From the GDPR (Hoofnagle, Sloot and Borgesius, 2019) and its antecedents, a number of concepts have been established which are relevant to this thesis, specifically (Information Commissioner’s Office, 2014; The European Parliament and the Council of the European Union, 2016):

The terms Subject Access Request and data portability are used in Case Study Two and are also referenced in Chapter 7.

For simplicity, this thesis uses everyday, layperson-friendly terms rather than the legal terms defined in this section. Data subjects are referred to simply as individuals, and both data controllers and data processors as data holders: because this thesis focuses on the individual's perspective, there is no need to distinguish between controllers and processors.

ARI3 Additional Reference Information for Chapter 3

ARI3.1 The Private Data Viewing Monitor

By removing the filter layer from an old monitor and modifying cinema IMAX glasses, a monitor was created whose screen could only be seen by the wearer of the viewing glasses. This would have been ideal for interviewing someone about their data while respecting their privacy. Face-to-face interviewing had to be abandoned due to COVID-19, so this technique was sadly never used in practice.

Figure ARI3.1: Private Data Viewing Monitor with Viewing Glasses

ARI4 Additional Reference Information for Chapter 4

ARI4.1 Family Civic Data Categories

The table below illustrates the types of family civic data identified in the pilot study [3.4.1; Bowyer et al. (2018); Appendix A], and referenced in Case Study One [4.2.1].

Table ARI4.1 - Example Categories of Family Civic Data.

| Category | Type of data | Examples/Details |
|---|---|---|
| Family | Personal details | Date of birth, address, telephone number |
| | Relationships | Marital status, ex-partners, step-parents, living arrangements |
| | Children | Parentage, adoption, fostering, childcare |
| Education | School records | Attendance (truancy), special needs |
| | Academic results | SATs, reports, exam failures, training courses |
| Welfare | Social support | Social worker visits & notes, details of family crises, interventions, allegations |
| | Welfare benefits | Jobseeker’s Allowance, child support, Disability Living Allowance, tax credits |
| Money/Work | Family finances | Salary, savings, credit cards, spending, debt |
| | Employment | Job history, periods of unemployment, performance at work, NI, PAYE, pensions |
| Civil | Housing data | Council house provision, eligibility criteria |
| | Legal documents | Birth / marriage / death certificates, citizenship / immigration status, work permits |
| Crime | Criminal records | Arrests, cautions, offenders’ registers, prison time, speeding tickets, spent convictions |
| | Court orders | Restraining orders, lawsuits, custody, ASBOs |
| | Domestic violence | Allegations made, medical records, social / legal interventions, victim support |
| Medical | GP records | GP’s notes, prescriptions, tests, referrals |
| | Hospital records | Operations, hospital stays, emergency care |
| | Medical conditions | Diagnoses, diseases, allergies, blood type |
| | Mental health | PTSD, breakdowns, depression, sectioning |
| | Addictions | Substance abuse, gambling, rehab, crime |
| Leisure | Library usage | Books/CDs borrowed, computer access |
| | Sports & health | Gym usage, class attendance |
| | Shopping habits | Loyalty cards, store & online purchases |
| | Transport data | Buses used, ANPR tracking, walking patterns |

ARI4.2 Sentence Ranking - List of Sentences and Analysis Approach

In this section, additional details are provided on the Sentence Ranking exercise referenced in 4.2.6.

The sentences offered to participants across the four workshops were as follows:

Where participants unanimously or mostly disagreed with a sentence, it is referenced in the inverse using prime notation: for example, S18' refers to the opposite of statement S18, in this case ‘Public sector officials cannot make good judgements just by looking at families’ data.’

In each of the workshops, participants ranked the sentences according to:

This produced numerical ranking data which was analysed as follows:

  1. Sentence rankings were encoded on two scales. Negatively phrased sentences were inverted so that disagreement with them could be treated as agreement with the corresponding positive statement.

    1. Agreement: neutral (0) -> agree (+1.0)
    2. Importance: not important (0.0) -> important (+1.0)
  2. Rankings from the different groups within each workshop were aggregated by mean averaging, weighted so that each workshop contributed equally regardless of attendance.

  3. This gave four values for each sentence, for each participant group (families only, staff only, and combined), as listed below. Variance can be understood as an inverse measure of ‘unanimity of opinion’: a variance of 0.0 indicates complete consensus, while 1.0 would indicate a complete division of opinion.

    1. Mean agreement
    2. Variance of agreement
    3. Mean importance
    4. Variance of importance
  4. Giving priority to variance of agreement over variance of importance, the four dimensions were reduced to three so that a visualisation could be produced.

The resulting visualisation is shown in Figure 4.1.
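The four analysis steps above can be illustrated with a short Python sketch. This is a hypothetical reconstruction rather than the analysis code actually used in this thesis. The `Ranking` record and all function names are invented for illustration, and several details are assumptions: that agreement is scored on a signed scale (so inverting a negatively phrased sentence is a simple negation), that the per-workshop weight is 1/(number of group rankings from that workshop), that population variance is used, and that step 4 amounts to dropping variance of importance.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Ranking:
    """One group's ranking of one sentence (hypothetical record layout)."""
    workshop: str      # workshop identifier, e.g. "A"
    participants: str  # "family" or "staff"
    sentence: str      # sentence identifier, e.g. "S18"
    agreement: float   # ASSUMED signed scale: -1.0 disagree, 0 neutral, +1.0 agree
    importance: float  # 0.0 not important .. +1.0 important


def weighted_mean_var(values, weights):
    """Weighted mean and weighted population variance of `values`."""
    total = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, var


def summarise(rankings, negative_sentences, who=None):
    """Return {sentence: (mean_agree, var_agree, mean_imp, var_imp)}.

    who=None gives the combined view; "family" or "staff" filters by group.
    """
    rows = [r for r in rankings if who is None or r.participants == who]

    # Step 2 weighting (assumed form): count the group rankings each workshop
    # gave for each sentence, so every workshop carries equal total weight.
    counts = defaultdict(int)
    for r in rows:
        counts[(r.sentence, r.workshop)] += 1

    grouped = defaultdict(list)
    for r in rows:
        agree = r.agreement
        if r.sentence in negative_sentences:
            agree = -agree  # Step 1: invert negatively phrased sentences
        weight = 1.0 / counts[(r.sentence, r.workshop)]
        grouped[r.sentence].append((agree, r.importance, weight))

    # Step 3: four values per sentence for the selected participant group.
    summary = {}
    for sentence, items in grouped.items():
        agrees, imps, weights = zip(*items)
        mean_a, var_a = weighted_mean_var(agrees, weights)
        mean_i, var_i = weighted_mean_var(imps, weights)
        summary[sentence] = (mean_a, var_a, mean_i, var_i)
    return summary


def to_three_dimensions(summary):
    """Step 4 (assumed reading): keep agreement variance, drop importance variance."""
    return {s: (mean_a, mean_i, var_a)
            for s, (mean_a, var_a, mean_i, _var_i) in summary.items()}


# Invented example data: two family groups in workshop A, one staff group in B.
rankings = [
    Ranking("A", "family", "S1", 1.0, 1.0),
    Ranking("A", "family", "S1", 0.0, 0.5),
    Ranking("B", "staff", "S1", 1.0, 0.0),
]
print(to_three_dimensions(summarise(rankings, negative_sentences=set())))
# prints {'S1': (0.75, 0.375, 0.1875)}
```

In this invented example the single workshop B ranking carries the same total weight as the two workshop A rankings combined, which is the "regardless of attendance" property described in step 2. Passing `who="family"` or `who="staff"` restricts the calculation to the families-only or staff-only view.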

ARI4.3 Storyboarding Action Cards

Drawn from the world of film production, storyboarding is a well-established technique in participatory design (Spinuzzi, 2005; Moraveji et al., 2007). It usually involves participants drawing a series of sketches in the form of a comic strip that ‘tells the story’ of an interaction, encounter or activity. However, it had already been determined, both through the research approach of this thesis [3.2.2] and in response to participants [4.2.6], that it was more important to understand the interpersonal interactions between family and support worker, and the actual actions performed upon or with data, than the mechanisms by which the data interaction would occur; a focus on the visual design of an information display could have been a distraction. I therefore developed a novel technique for use in the phase 2 workshop: Storyboarding Action Cards. Each card denotes a possible action carried out by a family member (yellow border), a support worker (blue border), or both together (green border). Each card includes a simple action summary, such as ‘Give Information’, an iconographic representation of the action, and a short description of which actor is doing what. It also includes blank lines which the participant can ‘fill in’ to describe the specifics of that occurrence of the action.

Figure ARI4.1: Extract of Sample Scenario Storyboarding Exercise walkthrough

Drawing on the accumulated knowledge of Early Help processes held by myself and my SILVER colleagues, supplemented for this purpose through consultation with a former social worker, I developed a total of 43 cards representing the range of possible actions that would be interesting to track. These are grouped into eight types of card:

The intent behind the storyboarding action cards is that they serve both as boundary objects and as things to think with (as with the Family Data Cards described by Bowyer et al. (2018)), provoking discussion among participants. They have an additional function over the Family Data Cards, however: they can be arranged in a sequence, much like a storyboard or comic strip, and filled in, to tell the story of exactly who would do what, and how, in a support conversation involving shared data interaction. In this way they lend themselves to modelling processes rather than designing objects. Figure ARI4.1 shows an example of three cards that have been filled in and arranged in sequence to tell a simple story of a worker seeking out an address following new information from a family member.

Figure ARI4.2: Example Backing Mat for Storyboard Decks (pictured here: backing mat for all three ‘problem’ card decks)

In addition to the storyboard cards, I designed ‘backing mats’ for each of the eight card types. These were printed on large sheets of card in the backing colour of the corresponding card type, and provided areas to hold the ‘decks’ of available cards for participants to pick from. Each backing mat provided separate areas for family member actions, staff actions, and joint actions, along with a summary of the available action cards of that type and a prompt question. An example backing mat, in this case for Problem Cards, is shown in Figure ARI4.2.

Introduction and Practice

To familiarise participants with the storyboarding action cards and the available actions, they were first given an introduction to the storyboarding concept as used in film-making and participatory design; the card design and intended usage were then explained. A very simple scenario of a family going through a breakup was used to talk through an illustrated example of how to map out the subsequent worker/parent conversation using the action cards. Participants were then invited to practise mapping out the same scenario themselves, but this time in a ‘problematic’ version where things do not go so smoothly.

Scenario-Based Storyboarding Discussions

Once participants were acquainted with the cards and had practised the storyboarding method, the main activity began, taking up the majority of the session. Each group mapped out two stories for a more substantial scenario: one version in which things go smoothly and a ‘negative’ version in which they do not. It was highlighted to participants that the aim was to identify what would or should happen at each stage, and why.

The scenarios used for this activity by the two groups were (a) a new scenario where a couple is looking at their historical medical records (which contain various matters of concern such as missed appointments and historical mental health issues) and (b) a ‘labels and judgement’ scenario that had been used in the phase 1 workshops. Additional scenarios were prepared but not used. The layouts of the completed storyboards were photographed for reference, and to provide context during analysis of the discussion transcripts.

For a completed storyboard layout example, see Figure 3.10.

ARI4.4 Notation for Quotations in Chapter 4

Quotations included in section 4.3 are referenced using the following notation:

The number after FQ/CQ/SQ provides a unique identifier for each quote. Individual speakers are identified only by their role. Within each quote, or in brackets afterwards, the speakers are identified as Worker, Parent, Child, or Researcher.

Most quotes and conversation extracts are directly embedded into section 4.3. All other quotes referenced in the text (excluded for reasons of space and flow) are included in ARI4.5.

ARI4.5 Additional Participant Quotations

The majority of quotations and conversation extracts in Case Study One are embedded inline throughout section 4.3. The quotes below were referenced in the text but excluded there for reasons of space and flow; the list also includes some quotes or extracts that were abridged in the chapter body and are given in full here.

Quotes from Families-Only Workshop [A]

FQ1 [Researcher (A), Parent (B) & Daughter (C)] A: “So [you think that she should be able to be] selective about the things she wants her worker to know and leave out things that she doesn’t?” B: “Yes, like only her mental health and what tablets she’s on and things.” C, talking to B: “It sounds like you.” […] B: “If she trusted her worker, I think she’d tell her herself though.” A: “Do you think that makes a big difference?” B: “I had a worker and my daughter didn’t like her and it made it really difficult when she came out. But she likes the new one.” C: “I don’t.” B: “Why?” C: “She’s annoying.” A: “So do you think the relationship makes a difference to how much you tell?” C: “Yes. Because if you don’t like them, why should you tell them?”

FQ3 [Researcher (A) & Parent (B)] A: “What do you think could be done? What would help [this family] feel a bit happier?” B: “Give them a one-to-one support worker who they can build up a trust and understanding where you feel like they’re not going to share your information. I don’t know, maybe come up with a computer thing so you [the family] know what they’re [the workers] putting in or maybe sign paperwork [to give your approval].”

FQ6 [Parents] A: “It’s so hard because we’ve all done things in our past […]” B: “I think for him to see [old medical records] the doctor should have requested it, it shouldn’t just be there for him to see. I don’t know, if he was going for some mental health problems or something and then [he can] look back… […] It should be like you have to request to look at that data. I know when I’ve been to the doctors and they actually go into a different part of the system to find my old records, which I think is a bit bad. It shouldn’t just be there.”

FQ9 [Child] “I [designed] a graph to show how you are feeling day by day.”

FQ11A [Parent] [discussing the sentence ‘Numerical scores are a good way to judge a family’s progress’] “No, I disagree, because just anybody can tick any numbers. You could have a good day, you could have a bad day.”

FQ11B [Researcher (A) & Parents (B, C, D)] A: “Do people have a right to know [past incidents with police]?” B: “Not really. The past is the past, isn’t it?” C: “No, because…” D: “You shouldn’t be judged on your past, but I think it should be there [accessible in the data] because I think at the end of the day, you can fall back into old ways. The thing is, if you’re putting a child at risk or a person at risk, I think you [the worker] need to know everything, don’t you?”

FQ12 [Parent] “[The parent could] countersign. [The worker would] say, ‘I feel that we’ve talked about this today so I’m going to write that down. I’m going to show you. Can you sign and me sign if you’re happy and I’m going to share this.’ That’s a bit different [better].”

FQ15 [Parent] “You would think that it would help with your benefits, [that] you wouldn’t mind sharing your data, would you, because they [support workers] are trying to help you. It’s not like they’re saying, ‘Well she gets too much money,’ They’re not trying to cut [families’] benefits, they’re trying to help [families].”

FQ16 [Parent] “[Families need to] feel they’re being involved. […] [We need to be able to] sit together and say, ‘Right, that’s the information I’ll allow you to share. I don’t want that bit shared. But this bit, because it will help me and the family […]’. Say in this [scenario] family, she might have been married before and had domestic violence so she doesn’t want that bit shared, that’s in the past. So it’s [only] certain up-to-date information about the family [that would be shared] because this [the family suggested by the data] isn’t her family.”

FQ17 [Worker (A) & Parents (B, C)] A: “‘Families don’t want to be responsible for looking after their data’?” B: “It’s one of those things where …” C: “You’ve enough on without all that.” A: “You just don’t think about it.” C: “And if you were to think about it, would you actually do anything?”

Quotes from Staff-Only Workshop [B]

SQ3 [Workers] A: “I think we would have to see all the data.” […] B: “If you’re going out to visit a family, you don’t know what you’re going to.” A: “It’s about protecting ourselves as well.” B: “Yes, we have to check for markers, potential violence, things like that.”

SQ4 [Worker] [imagining an interface that would allow workers to see missed appointments] “Often they can lie to you, can’t they, and say, ‘Well yes, I’ve been to the doctor. Yes, I’ve been to the dentist. Yes, I’ve done that and yes, I’ve done that.’ But then [with this] we’ve kind of got the proof.”

SQ5 [Worker] “[a benefit of having family’s data is that] families don’t have to tell the tale over and over again […] they don’t want to have to keep verbally telling everybody.”

SQ6 [Worker] “I had one [client] yesterday where she was nearly all fives [out of 5] because they’d made that much progress. I had to put that on. So, she saw that as a real positive […] She was like, ‘I don’t need your support on this, I don’t need your support on this.’”

SQ9 [Workers] A: “Sometimes they might have been out and had a drink, had an argument but the police have been called and it’s recorded as domestic abuse.” B: “That’s what I’m saying about it [the ‘domestic violence’ label] being overused.” A: “In isolation, it probably wouldn’t be classed as domestic abuse. It was just an argument.”

SQ10 [Worker] “I think we make a lot of assumptions on information that we get about families without actually talking to them to find out why.”

SQ11 [Workers] A: “I think you should never make a judgement on data …, that data could be wrong.” B: “It takes individuality, working with that person as well, doesn’t it?”

SQ12 [Worker] “It all depends on what data they’ve got. You take that family I worked with, if there was nothing on there about the mental health, she just looked like a really, really poor parent when in fact she’s not. I think a lot of the professionals over the years have just thought that. So, I disagree [with that sentence].”

SQ13 [Researcher (A) & Workers (B, C)] A: “Was that fair and appropriate and is that accurate in terms of [what data has been viewed]?” B: “I think it would be fair… I think for me it’s fair if it’s current because…” C: “It can only be fair if it’s complete, [if] you’ve got all the information there.”

SQ14 [Worker] “Parents might not want certain information [shared] so it might not be on [the visible data records] anyway…”

SQ15 [Worker] “They [families] don’t like people knowing what’s going on in their lives.”

SQ17 [Worker] “You often get [that] by the time they’ve got back from the doctors, it’s ten times worse than the conversation actually was and three other things were thrown in and then they started spiralling out of control thinking about ‘What has been said behind my back?’ sort of thing.”

SQ18 [Worker] “It hasn’t been explained properly to this family that their information will be shared with other professionals. So, they’ve been left feeling really let down and probably quite angry about it. So, although that information does need to be shared, they [the support workers involved] ought to make the family properly aware that information will be shared.”

SQ20 [Worker] “A lot of the families we work with have got the fear that we’re still social workers or attached to social workers. So, they’re saying, ‘I’m not going to share with you or work with you.’ […] [They might] say, ‘You’re not social services are you? We’re not going to have the kids taken away?’”

SQ23 [Workers] A: “I think [the medical data we can access] has to be issue-specific. I think to be able to see somebody’s full medical history is not always relevant to why we’re working with them.” B: “I had a gran who had residency and the GP sent everything from when she was 15 [including the details of her lost pregnancies]. That wasn’t relevant to what they were doing at the time with the grandchildren and residency. It’s got to be relevant.” […] A: “Relevant to what you’re doing with the family.” B: “Yes, relevant with the priorities and the issues what’s affecting them.”

SQ24 [Workers] A: “Yes.” [to the sentence ‘Families’ data should be private unless they say it can be shared.’] B: “Unless it’s safeguarding, obviously.” A: […] “It’s private, but I guess if there was a real significant need for us to know or somebody else to know that information for safeguarding…” B: “The law will overrule.”

SQ25 [Workers] A: “Imagine somebody doing that [checking all the different data sources] though, that would be a lot of work, wouldn’t it?” […] B: “But actually, that’s a really good idea to have it all in one place.”

SQ26 [Workers] A: “[In this imagined ideal system] you would press on ‘Mum’ and then get all the data.” B: “You’d get all the data, anything you want.” A: “Crime, financial, just the things that we get. Then everything for Dad.”

SQ30 [Worker] “I think for some parents it will be good for them to visually see it as well. […] So you’re able to give them almost a visual context rather than just talking at them. Different people take information in different ways, don’t they?”

SQ31 [Worker] “I guess the things with [tables of data] is that might just be like a number or a percentage… whereas [using a pie chart or graph] is actually giving some context.”

SQ32 [Workers] A: “A lot of the time they say, ‘I’m not going to get into any more trouble,’ [but with the ability to show them data] you can say, ‘But if you did, this could happen.’” B: “If you get into more bother, you’re going to go straight back down to there [acts pointing at data]. Look where you are now. If you carry on you’re going to end up up there but if you go back, if you continue to smoke that weed and smash that phone box, you’re going to go straight back down to there.”

SQ34 [Workers] A: “[Our idea is] an app for checking that data, with graphs and charts.” B: “That would be amazing if we just sat down with them and handed them [a tablet] and said, ‘We’ve just updated [our records]. Can I just check the accuracy?’”

SQ35 [Researcher (A) & Worker (B)] A: “What do you think determines whether [families] do or don’t have an interest in [checking their data]?” B: “I think the experiences that they’ve had […] If it’s historical to say a safeguarding, [they’ll just think] ‘we know what the process is, we know how things are kept, we’re not going to be able to do anything about it.’”

SQ38 [Worker] “Families don’t know [what] data was being collected anyway […] If they knew what data was being collected about them and why it was being collected about them, I think they would mind – but I think that regardless of the fact whether they can see it or not, a lot of families don’t know how to access it because it all comes in the small print.”

SQ39 [Worker] “Not many families ask to see the case notes, whether it’s a social worker or whether it’s a family partner, other members of the authority or any other services. So […] even if they’ve seen the data, [I’m not sure] whether they’d be confident with everything that’s been on it.”

SQ40 [Worker] “Some families will go, ‘Well you know that information because it’s all there somewhere.’ We’re like, ‘Yes, but we don’t want to trawl back to eight years ago.’ There’s reams and reams and reams of it [data].”

SQ41 [Worker] “The information that we hold […] you would verbalise this as well when you go to visit the family. But what we [imagine] is expanding that a little bit more so: explaining why we hold the information that we hold, the process of why we store data, the information that we’ve got.”

SQ42 [Worker] “A lot of […] families talk to us about data we’ve collected and not one family I’ve ever met has got an issue with that. We go to them and say, ‘We’re aware that you’ve got these issues going on,’ and it might be antisocial behaviour or school attendance, health or a domestic violence incident and they’ve never said, ‘How on earth have you got that information?’”

SQ44 [Worker] “For me, there’s so much data that’s stored. For me, for a parent to understand that through a text or email but just in point form. […] The less written, the better for the parent. [What we need is] a small synopsis […] like a summary view.”

SQ45 [Workers] A: “You know when people do have difficulties in terms of reading, on the computer you [could] press the sound button and it can read it for you. […] like text to audio.” B: “[It needs to be in an] easily understandable format, taking into account the family’s needs.”

SQ46 [Workers] A: “[using a data interface to convey data to families] is quite verbal, isn’t it?” B: “It is. The way you use your words, the way you use your language […] [the] husband’s needs are completely different to what [the] wife’s are. Her levels are really low and your levels are really high. I think that’s about the way you use your words…” A: “It’s how you explain it.”

SQ47 [Workers] A: “In terms of children, [you would need to have] more pictures and it would [need to] be clearer. [… Let’s write down] ‘Using age appropriate information’.” B: “Yes […] so it [would be] tailored content for the individual, if the age is there it might be sensitive information.”

SQ48 [Workers] A: “[There should be] separate data for each member.” B: “So really, if you want to talk to the daughter, she’s not going to see the mum or dad’s data. If you’re talking to the dad, he’s not going to see…” A: “Unless they get permission. So you [could] have a tick box system at the start about who can see what…”

SQ51 [Worker] “[The families would have] a little app which they can log in to and read all their information - what’s recorded about themselves, they can read the consent policy, who we share the information with, who we have shared the information with. If they’re not happy — this would be a read-only app for them — if they’re not happy they can fire off an email to us and let us know what they disagree with or if they want their information taken down or their consent.”

SQ52 [Worker] “You’d just have a different page for each one of the priorities what we work with and all the information stored under there. So our key feature would be you’d be able to have individual family members log in. That would be to prevent the child seeing what mum and dad’s issues were and stuff like that if it wasn’t relevant. You’d be able to select what information is visible to other family members.”

SQ55 [Workers] A: “[It’d be good to have a way to] capture young person’s voice and conversations.” […] B: “Self-help buttons [would be good] so say if somebody is feeling depressed […] There is a lot of self-harm going on at the moment.”

SQ56 [Worker] “[our app design] would allow [families] to record audios and then the workers can then access those transcriptions. […] There’s no chat, it’s just about getting their worries, if they can’t sit and talk to you in a face to face, one on one conversation…”

SQ57 [Worker (A) & Researcher (B)] A: “There’s times when I’ve been totally stuck in terms of getting information from professionals, GP, CAMHS, so I’ll say to the family, ‘I need this information, can you ring and get it?’” B: “So the family point you in the right direction, so they fill in the gaps for you?” A: “Yes.”

SQ58 [Workers] A: “There’s loads of things where [families] make massive improvements, it’s just not recorded. [They might have] changed their diet or lifestyle. There are loads and loads of things…” B: “But it’s not recorded as data.”

SQ62 [Workers] A: “I would be inclined to agree because they can’t get away from it.” B: “I think it depends on how you would pass it back, really.” A: “Well, it would be useful in meetings to know that she’d suffered from domestic abuse.” C: “Yes, I can see the benefits and the downsides, yes.” B: “Yes, so, they can shake it off but it also gets in the way.”

SQ63 [Workers] A: “[reading sentence] ‘Asking families for consent to share data just once at the start is enough.’ This is what we do now but how many times, when things go wrong families say to you, ‘I didn’t consent to that, I didn’t. That’s not what you asked me at the beginning.’ […] I don’t know if there should be a regular…” B: “…like an update, because things change in their life.” […] A: “[Should] we then [have] reviews, every six weeks [or so …], say to them, ‘Well let’s just remind each other what share consent is for and about.’? […] Obviously it’s got to be regularly done because […] circumstances change.”

SQ64 [Workers] A: “[You would] click on the feed [an imagined feed of updates concerning the family] and it would bring up if they’ve been in trouble.” B: “Absolutely. This [would] definitely [be] your perspective of families.”

SQ65 [Workers] A: “We would get a report through to say…” B: “They’ve recorded something.” A: “Yes. Then I suppose we would follow it up […] face to face.”

SQ67 [Researcher (A) & Worker (B)] A: “So is the key point of this one, that the families have input, as well and agree on what is put on there?” B: “Yes, so, agree on it and then they can add their signature.”

SQ72 [Worker] “You will have parents who will say that they don’t want to share because they know the consequences. One of our families, the little one, she’s six, and there was a DV [= Domestic Violence] incident and her mum was like, ‘Don’t say anything at school.’”

SQ75 [Worker] “[This imagined data interface] would be accessible to both worker and family member so that we can be in sync but [would be] encouraging the family to take full accountability for their own responsibilities.”

SQ76 [Worker] “Let’s say dad was sexually abused when he was a child, I think that’s important that we know that because dad could have mental health problems now which would be a result and we didn’t know that and he didn’t want to speak about it.”

SQ77 [Researcher (A) & Workers (B,C)] A: “Was that fair and appropriate and is that accurate in terms of [what data has been viewed]?” B: “I think it would be fair… I think for me it’s fair if it’s current because…” C: “It can only be fair if it’s complete, you’ve got all the information there.”

SQ78 [Worker] “So maybe you’ve got groups of young people who are, I don’t know, there’s something going on maybe in [local park], you’ve got some antisocial behaviour and they might be putting on their things that they like to do it with their friends. Then we pull from that, actually you’ve got a group of these young people who are involved in this. Then from that you can have focus groups. So, I think [if] we all as family partners know that we’ve got groups of young people where they are hanging out together so instead of just being one worker, I might think, ‘Well actually, there’s so many people in my team have got these kids so we can have a focus group.’”

Quotes from Combined Parents and Staff Workshop [C]

CQ1 [Worker (A) & Parents (B, C)] A: “I think most families wouldn’t think about [checking their data] until […] something happens and they go, ‘Hang on a minute, that’s not right.’” B: “Yes, ‘Where’ve you got that from?’” C: “Yes, yes.” A: “But I think, other than that, we tend to just trust that everything that has been put down is right, don’t we?” C: “Yes.”

CQ2 [Worker] “That happens a lot, doesn’t it? It does happen where information is shared and then somebody gets upset because they didn’t think that level of information would be made available, even though permission had been given at the start of the plan.”

CQ8 [Parents (A, D), Worker (B) & Researcher (C)] A: “If you find [a criminal record for burglary], you’re looking and thinking, ‘God! She’s gone out and committed a bloody burglary.’” B: “Well, it could affect your employment chances if that comes back on your DBS. But I explored it and talk about it and she said, ‘Well, I don’t agree with that. That’s not what happened.’ I mean, she did break in but she wasn’t stealing anyone else’s stuff, it was her own stuff. […] If there is breaking and entering and burglary, and no explanation of that, and no way for that person to give you an explanation …” C: “It’s just somebody’s version of what happened?” B: “Well, it is, isn’t it?” D: “Well, the Courts need to change what’s recorded because if you broke into a house and stole a telly, that would come to the top. Whereas, something like that, which is more or less trespassing. In the eyes of any decent solicitor, it’s trespassing, to get your own stuff but, technically, you’ve stolen your own stuff. That should be put on a scale of severity, of 1 to 5, in the circumstances. If you’re homeless and you break into an empty house, is that burglary? Is that worth three years in prison? You know what I mean?”

CQ11 [Parent (A) & Researcher (B)] A: “I would want to see what information is held about me but then there are people out there who aren’t very confident in being able to ask or if they can’t read, if they’ve got learning [difficulties]” B: “What should happen for those people then?” A: “They should be supported by whoever is around them to access it in some form or another.” B: “They need to have someone talk them through it, or something?” A: “Yes.”

CQ12 [Parent] “I think a lot of people would like to be able to [access their data]. I think the prospect of, if you want to see your medical records […] having to make an appointment and go up and sit down and read paper records [is not something people would choose, whereas] if they were able to access it, in their own time, at their own pace [that would work better]. I’d love to see what’s been written about me in my medical records, I think some of it could be quite interesting.”

CQ15 [Parent] “I think [whether support workers should be able to access mental health details] depends on how long ago it was. […] I went through a really, really rough patch […] nearly 20 years ago and I had a brief patch of about three weeks where I was really not controlling my depression and I self-harmed and made an absolute fool of myself, and I’m fine with that now but I wouldn’t want people, everybody, to know about that because I wouldn’t want people to jump to the conclusion — because they still do — that there’s something wrong and I’m going to do it again and things like that. Because people change, and situations change.”

CQ17 [Worker] “I think most families wouldn’t think about [looking at or checking their data] until […] something happens and they go, ‘Hang on a minute, that’s not right.’”

ARI5 Additional Reference Information for Chapter 5

ARI5.1 GDPR Data Analysis Approach

In this section, the methodology used for the analysis of data from Case Study Two is explained. The content of this appendix is identical to Appendix 3 in the Supplemental Materials of the CHI 2022 paper from this study (Bowyer, Holt, et al., 2022). Case Study Two was written first as a paper and then expanded to produce Chapter 5. While the paper was co-written, Chapter 5 was written entirely by Alex Bowyer.

All coding was carried out by Alex Bowyer and Jack Holt, who used the process below over a nine-month period, comprising at least 200 person-hours:

  1. EXTRACTION AND ANALYSIS OF SEMI-QUANTITATIVE DATA: Identifying closed question (or brief) responses that might be processable quantitatively.
  2. TEXT FILE PROCESSING: Splitting, organising, anonymising and some cleaning of auto-transcribed and time-coded text files.
  3. CATEGORISATION INTO CSVs: Categorised extraction of timecoded text sections from text files into cells of a 6-topic spreadsheet, then generation of CSV files for importing into Quirkos Cloud (Daniel Turner, 2014).
  4. INDUCTIVE CODING: Import of CSVs into Quirkos Cloud and labelling by Participant, Company, and Topic. Inductive coding of source texts, ensuring good coverage per topic and per participant.
  5. REDUCTIVE CYCLES: Reductive cycles of merging, renaming and reorganising the codes hierarchy, resulting in 10 top-level codes with hierarchies of coded texts underneath them.
  6. THEME IDENTIFICATION & QUOTE EXTRACTION: Construction of 3 paper-focussed themes using Workflowy (Turitzin and Patel, 2010) and quote gathering using the organised codes hierarchy.

Some additional detail on the stages:

1. Semi-Quantitative Data Extraction & Analysis

Before coding began, responses to some key closed questions from the transcripts were combined with field notes, response emails from companies forwarded by participants, sketches and tables from Interview 1/2, data from the Interview 2/3 spreadsheet cells, and other data collected, and used to populate a spreadsheet summarising those responses. For example, where participants had been asked to outline their hopes for the outcomes of their GDPR data requests, these responses were recorded on the spreadsheet as a resource for summarising participant hopes in a form that could be easily quantified and referred back to. In some cases this data was analysed within the spreadsheet to produce insights, graphs and percentages, which were later used to support and illustrate findings from the coding process. The spreadsheet also recorded important information about each participant’s GDPR process experience, such as the timeliness and completeness of their data returns, which could serve as a reference point when analysing the transcripts.

The semi-quantitative data areas captured or derived from captured data were:

2. Text File Processing (Splitting & Recombination)

The researchers then moved on to prepare for the fully qualitative analysis. All interview audio was auto-transcribed using Zoom and Google Recorder, and the generated text files were then cleaned. Cleaning consisted of listening to sections of audio where transcription seemed inaccurate and correcting the transcripts. Due to the volume of data this cleaning was not done for all texts, only where ambiguity or typos meant it was needed for accurate coding and for quotes. Some anonymisation of source texts was also carried out at this stage and later, with a particular focus on quotes included in the chapter. The researchers used this data preparation stage as an initial means of (re)familiarising themselves with the dataset. With reference to the structured interview schedules, the initial 33 text transcripts were split up by participant, company and topic using the labelling scheme outlined in ‘Text File Labelling Strategy’ below.

At the end of this process, roughly 100 ‘pieces’ had been identified for each participant (slightly more for P11, whose interview 1 covered a broader scope, and considerably fewer for P9, who only took part in interview 1).

3. Categorisation into CSVs

The pieces from stage 2 were then recombined, across all participants, into 233 source files, which were further grouped into 6 topic areas. (The aim of the analysis was to identify common opinions and ideas around different topics, not to explore individual participant journeys end-to-end.) The six topic areas were:

  1. POWER – discussions and scoring around the power of data holding companies
  2. TRUST – discussions and scoring around participants’ subjective trust in data holding companies
  3. LIFE – life sketching and annotation discussions, and ‘digital life’ questioning
  4. HOPES & USES – discussions around motivations, expectations, goals and hopes, and imagined uses of data
  5. COMPANY-SPECIFIC – (repeated once per target company per participant) – all discussions around the data return from a particular company
  6. GENERAL – all non-company specific discussions not captured elsewhere

This produced too many files for import into Quirkos Cloud, so once organised by topic, these six groups of files were further combined into 11 General files and 46 Company-Specific files (with Life and General going into the General files and everything else into Company-Specific). This gave 57 organised CSV files ready for use in the first coding phase.
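The routing rule above (Life and General pieces into General files, everything else into Company-Specific files) can be sketched in code. This is a minimal illustration under stated assumptions, not the process actually used: the piece dictionaries, the column names (chosen to mirror the Participant, Company and Topic labels later applied in Quirkos) and the function names are hypothetical, and the sketch writes one CSV text per group rather than reproducing how pieces were divided across the 11 General and 46 Company-Specific files.

```python
import csv
import io
from typing import Dict, Iterable

# Topic areas routed to the General files (per the text); all others go to Company-Specific.
GENERAL_TOPICS = {"LIFE", "GENERAL"}
HEADER = ["Participant", "Company", "Topic", "Text"]  # hypothetical column names


def file_group(topic: str) -> str:
    """Return the combined file group for one of the six topic areas."""
    return "General" if topic.upper() in GENERAL_TOPICS else "Company-Specific"


def write_group_csv(pieces: Iterable[Dict[str, str]], group: str) -> str:
    """Serialise the pieces belonging to one group as CSV text, header row first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(HEADER)
    for piece in pieces:
        if file_group(piece["topic"]) == group:
            writer.writerow([piece["participant"], piece["company"], piece["topic"], piece["text"]])
    return buffer.getvalue()
```

For example, a Trust piece about Facebook would be written to the Company-Specific output, while a life-sketching piece would be written to the General output.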

4. Inductive Coding

The majority of the analysis took place in Quirkos Cloud (Daniel Turner, 2014), a computer-assisted qualitative data analysis software (CAQDAS) package that supports collaborative analysis by more than one researcher. The 57 files from stage 3 were imported into Quirkos Cloud, each with a unique number, and the sources were labelled by Participant, Company and Topic for easy search and retrieval. The researchers then collaboratively coded sections of the interview transcripts to develop and ensure a consistent approach, based on established techniques (Huberman and Miles, 2002; Braun and Clarke, 2006). Codes were identified inductively rather than from a fixed or predetermined set. Once a baseline codeset and strategy had been established, the two researchers each coded sections of interviews in parallel, regularly regrouping to discuss generated codes and any new questions or challenges arising. At first, these codes were created in an unstructured (flat) state with only occasional clustering in the Quirkos interface. Due to the volume of data, not every piece of every transcript was coded; however, care was taken to ensure that a representative sample of views from across the participant pool was included. The codes were clustered into loose code-topic areas; an example is shown in the following screenshot, taken approximately 6 weeks into coding:

Figure ARI5.1: Screenshot from Quirkos During Coding Process

5. Reductive Cycles

As more codes were identified and structures and commonalities between them emerged, existing codes were merged or absorbed into one another and grouped together in small clusters. The researchers met regularly to discuss each other’s codes in context, and occasionally amended wording or merged concepts that were labelled differently but semantically equivalent. All codes were checked and agreed between the two researchers. Over time, the codes were iteratively structured and restructured, creating top-level thematic clusters around different research questions that held multiple layers of related codes. These clusters were then summarised with a short sentence or paragraph of text, allowing summaries to be produced at different levels of the hierarchy. These summaries were kept in the Description fields of codes in Quirkos and also in external structured text-based documents. They can be seen in the following screenshot, taken 5 months into coding:

Figure ARI5.2: Screenshot from Quirkos at End of Coding Process

The above-pictured structure of the coded corpus at the end of the Quirkos Cloud phase was as follows:

Total codes = 645.

6. Theme Identification & Quote Extraction

Having produced the structure above as a reduced representation of ‘what the codes say’ that the participants think, the researchers used the outlining tool Workflowy (Turitzin and Patel, 2010) to develop the arguments and primary narrative of the chapter into a structured summary, organised around three themes, of the most important findings. The code hierarchy was used as source material to populate the three key themes with illustrative quotes and observed findings. An example from later in this process (around 8-9 months after Stage 1 began) is shown in the screenshot below:

Figure ARI5.3: Screenshot from Workflowy During Theme Construction

The themes are broken down in detail in 5.4 and can be summarised as:

  1. Insufficient Transparency: Organisations appear evasive over data when responding to GDPR, leaving people “in the dark” even after making GDPR requests.
  2. Confusing Data: When presented with their data, people struggle to understand it and relate it to their lives and are not able to make use of it.
  3. Fragile Relationships: Companies’ data practices, and in particular their privacy policies and GDPR response handling, can affect customer relationships, carrying a risk of damaging trust but also the potential to improve relations.

In all, the process from commencing data analysis to writing up thematic findings in the chapter took over 200 person-hours over a 9-month period from January to September 2020.

Text File Labelling Strategy used in Stage 2

In stage 2, text files were initially broken down into small pieces and labelled according to the following strategy:

Interview 1 (Sensitisation / Poster Display Chat)

Break into 5 parts:

Interview 1 (Main Sketch Interview)

Break down as follows:

Format: NN-pXX-iX-[Comp/Type/Uses/GDPR/Motv]-[company first four letters].txt

e.g. 01-p01-i1-Comp.txt or 02-p01-i1-Powr-Face.txt

Interview 2

Break down as follows:

Format: NN-pXX-iX-[….]-[company first four letters].txt

e.g. 01-p01-i2-Priv-Goog.txt

Interview 3

Break down as follows:

Format: NN-pXX-iX-[….]-[company first four letters].txt

e.g. 01-p01-i3-Cred-Indr.txt or 02-p01-i3-Genr-Wrap.txt
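The labelling format above is regular enough to be parsed mechanically, which is what makes it possible to recombine pieces by participant, company and topic in stage 3. The following sketch is hypothetical (the pattern, field names and function name are not from the original study). It accepts any run of letters in the topic and final segments, so it handles the examples both with and without a final company segment, without depending on the exact abbreviation length.

```python
import re

# Hypothetical pattern for NN-pXX-iX-[topic]-[company].txt; the company segment is optional,
# as in the example 01-p01-i1-Comp.txt.
LABEL_PATTERN = re.compile(
    r"^(?P<piece>\d{2})-p(?P<participant>\d{2})-i(?P<interview>\d)"
    r"-(?P<topic>[A-Za-z]+)(?:-(?P<company>[A-Za-z]+))?\.txt$"
)


def parse_label(filename: str) -> dict:
    """Split a labelled piece filename into its parts; raise ValueError if it does not match."""
    match = LABEL_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unrecognised label: {filename}")
    parts = match.groupdict()
    return {
        "piece": int(parts["piece"]),
        "participant": f"P{parts['participant']}",
        "interview": int(parts["interview"]),
        "topic": parts["topic"],
        "company": parts["company"],  # None when the final segment is absent
    }
```

For instance, `parse_label("02-p01-i1-Powr-Face.txt")` yields piece 2, participant P01, interview 1, topic Powr and company Face.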

ARI5.2 Best and Worst Companies for GDPR Handling

The quality and coverage datapoints described in 5.3.3 also made it possible to identify which service providers were strongest or weakest in each category and overall. This was done by tallying the ‘Yes’ responses for each category and overall, then dividing by the number of times that provider was selected, to avoid inflating scores for popular companies. The outcome of this analysis is shown in Table ARI5.1. The companies that fared worst overall were those that did not return any data at all in response to a GDPR request (Sainsbury’s, Freeprints, Tyne Tunnels, LinkedIn, Huawei, Bumble, LNER). It should be noted that Sainsbury’s and Huawei did respond, claiming to hold no data for the requesting participant; participants found this implausible, which indicates a problem with compliance, explanation or trust. The other named companies did not respond at all, despite at least two follow-up emails being sent to them, and in some cases despite having initially acknowledged the request and promised to satisfy it.
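The normalisation described above can be sketched as follows, assuming a hypothetical input of one (provider, answers) pair per participant request, where answers maps each quality or coverage category to whether the participant judged it ‘Yes’. Dividing by the number of selections means a provider chosen by many participants does not score highly merely because it was chosen often. Here ‘Overall’ is taken to be the total ‘Yes’ tally across categories divided by the same selection count; the exact aggregation behind Table ARI5.1 may differ.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping, Tuple


def normalised_scores(
    requests: Iterable[Tuple[str, Mapping[str, bool]]]
) -> Dict[str, Dict[str, float]]:
    """Score each provider per category and overall, divided by how often it was selected."""
    requests = list(requests)  # iterated twice below
    selections = Counter(provider for provider, _ in requests)
    yes_tallies: Dict[str, Counter] = defaultdict(Counter)
    for provider, answers in requests:
        for category, is_yes in answers.items():
            # Adding 0 for a 'No' still records the category, so it later scores 0.0.
            yes_tallies[provider][category] += int(is_yes)
    scores = {}
    for provider, times_selected in selections.items():
        tally = yes_tallies[provider]
        per_category = {cat: count / times_selected for cat, count in tally.items()}
        per_category["Overall"] = sum(tally.values()) / times_selected
        scores[provider] = per_category
    return scores
```

With two requests to provider A (Coverage judged ‘Yes’ twice, Quality once), A scores 1.0 for Coverage, 0.5 for Quality and 1.5 Overall.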

Companies producing responses with good coverage and good quality included Niantic, Nectar and Sunderland AFC, as well as, to a lesser extent, Natural Cycles, Revolut, Spotify, Tesco and Amazon. Facebook and Google fared well for the breadth of data returned (due in part to their download dashboards), though the quality of Google’s data was found lacking across multiple categories. Last.fm (owned by CBS) fared poorly overall due to poor category coverage, even though the limited data it did return was of high quality.

Table ARI5.1 - Best and Worst Data Holders for GDPR, according to Participants’ Judgements.

ARI7 Additional Reference Information for Chapter 7

ARI7.1 BBC R&D’s Cornmarket Project

I took a three-month sabbatical from my PhD in the summer of 2020. I was remotely embedded within a full-time research internship at BBC R&D - the British Broadcasting Corporation (BBC)’s Research and Development (R&D) department (British Broadcasting Corporation, 1997), working with specialists, designers, researchers and developers on an exploratory research project codenamed Cornmarket. I continued this involvement as a part-time research consultant and critical friend for a further 5 months after the conclusion of the initial three-month placement.

As part of its Royal Charter, one of the BBC’s lesser known obligations is to maintain a centre of excellence for research and development in broadcasting and electronic media, and to this end it employs over 200 researchers in its R&D department looking at everything from AV engineering and production tools to new forms of media, virtual reality, digital wellbeing and human data interaction (British Broadcasting Corporation, 1997). The Cornmarket project, launched in 2019, is a BBC-internal human-data interaction research project which explores a possible role for the BBC as it moves beyond broadcast television, using its public service responsibility to guide citizens to a position of empowerment within today’s digital landscape - encompassing not just entertainment but health, finance and self-identity. Due to its unique funding from UK-wide TV licensing and its duties to not only entertain but to inform and educate the general public, the BBC is uniquely placed to take a more human-centred approach than commercial innovators in this space as it needs only to deliver value, not profit. The project is exploring the use of Solid (Berners-Lee, 2022) technology to build a working Personal Data Store (PDS) prototype [2.3.4] while also developing, iterating and trialling user interface designs and conducting participatory research interviews and activities, all to explore what form a BBC PDS might take and what features its potential users might value.

The proposed BBC Cornmarket product, internally called My PDS, would allow people to populate a PDS with personal data from APIs and data downloads from a variety of services including BBC iPlayer, Netflix, All4, Spotify, Instagram, Strava, Apple Health, banks and finance companies, as well as social media companies such as Facebook, LinkedIn and Twitter, and then to use these combined data sources to create personal profiles for Health, Finance, Media (i.e. entertainment) and Core, within which various data insights, visualisations and capabilities would be delivered. One feature the work explores in depth as potentially valuable to users is the ability to include and exclude certain datapoints from the imported viewing history data in order to present a more accurate, curated view of oneself that could then be fed back to other applications such as BBC Sounds to give better content recommendations.

With a cross-disciplinary team of around 20 people including architects, developers, user experience designers, product designers, innovators, participatory researchers and marketers, and funding to outsource public engagement research to agencies, this project represents a significant initiative in the emerging personal data economy [2.3.4]. As such the Cornmarket project is a fertile ground in which to learn more from practitioners in the PDE space and to test the learnings of this thesis in practice while also finding deeper insights in response to my research questions - in particular RQ3 which is concerned with the building of more human-centric personal data interfaces in practice.

Much of the work I did during this extended internship can be seen in the designs within 9.3, as well as the research report I wrote (Bowyer, 2020a) and internship writeup (Bowyer, 2020b). My work with the Cornmarket project can be seen as the concluding part of one of several action research cycles within my PhD [3.2.2].

An additional figure from my time on Cornmarket that was not featured in the main body of the thesis is shown in Figure ARI7.1 below. It shows a screenshot from a functional prototype tool I produced during a hack week, which allows the user to upload data retrieved via a GDPR request or download portal, and which proved the concept of programmatically identifying key entities [9.3.3] and time-labelled events for display to users as life information.

Figure ARI7.1: Prototype Entity Extractor and Time-Event Extractor
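The idea behind the time-event extractor can be illustrated with a short sketch. This is not the prototype’s code: it assumes, as a crude and hypothetical heuristic, that any JSON string value parseable as an ISO 8601 date or timestamp marks an event, and that timestamps without a time zone are UTC. It walks a parsed GDPR or download-portal JSON file and returns each timestamp with its key path and the record it came from, ready to be sorted into a timeline.

```python
import json
from datetime import datetime, timezone
from typing import Any, List, Optional, Tuple

Event = Tuple[datetime, str, Any]  # (timestamp, key path, record containing it)


def parse_timestamp(value: Any) -> Optional[datetime]:
    """Return a timezone-aware datetime if value is an ISO 8601 string, else None."""
    if not isinstance(value, str):
        return None
    # fromisoformat() only accepts a trailing "Z" from Python 3.11, so normalise it.
    text = value[:-1] + "+00:00" if value.endswith("Z") else value
    try:
        stamp = datetime.fromisoformat(text)
    except ValueError:
        return None
    # Assumption: timestamps without a zone are UTC (keeps all events comparable when sorting).
    return stamp if stamp.tzinfo is not None else stamp.replace(tzinfo=timezone.utc)


def extract_time_events(node: Any, path: str = "$") -> List[Event]:
    """Recursively walk parsed JSON, collecting every timestamp-valued field."""
    events: List[Event] = []
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}.{key}"
            stamp = parse_timestamp(value)
            if stamp is not None:
                events.append((stamp, child_path, node))
            else:
                events.extend(extract_time_events(value, child_path))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            events.extend(extract_time_events(item, f"{path}[{index}]"))
    return events


if __name__ == "__main__":
    raw = '{"account": {"created": "2015-03-02"}, "plays": [{"ts": "2020-07-01T12:00:00Z", "track": "Song A"}]}'
    for stamp, path, _record in sorted(extract_time_events(json.loads(raw)), key=lambda e: e[0]):
        print(stamp.isoformat(), path)
```

Keeping the enclosing record alongside each timestamp means the neighbouring fields (such as a track name) are available to describe the event to the user.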

A number of articles relating to the Cornmarket project have been published:

ARI7.2 Hestia.ai, and Sitra’s digipower Project

Following the conclusion of the funded period of my PhD, I took up a near-full-time position as Project Leader and Personal Data Coach at Hestia.ai (Dehaye, 2019), a startup based in Geneva, Switzerland. Hestia.ai is a company conducting research, developing technologies and delivering training in the emergent MyData/PDE space [2.3.4]. In essence, the company’s mission is to help individuals and especially collectives to more easily obtain and understand data held about them, and to help them visualise, aggregate and make use of that data. It is an example of a data access and understanding services company as described in 9.5.3.

I was specifically hired to co-lead the digipower project (Härkönen and Vänskä, 2021) for Hestia.ai’s client, Sitra (Sitra, 1967). Sitra is a non-profit organisation in Finland, funded by the Finnish Parliament and accountable to the Finnish people. The goal of the digipower project was to guide 15 European politicians, civil servants and journalists through the process of obtaining and exploring their own data. The participants were high-profile figures, including the former Prime Minister of Finland and former European Commission Vice President, Jyrki Katainen. The aim was to empower those individuals to better understand the workings of the data economy, so that they might be able to influence others and effect change. One of Sitra’s goals is to establish a fairer data economy (Sitra, 2018). Methodologically, the project drew heavily on my own Case Study Two [Chapter 5], adopting a similar method of guiding individuals through the process of making GDPR requests and scrutinising the returned data; I was employed on the project for this expertise. Where it differed from my own Case Study was that the focus of the research was outward, on the data economy and the practices of service providers, rather than inward, on the lived experience of the participants. Other differences included the building and use of software interfaces to provide participants with data visualisations, the use of TrackerControl software to audit mobile phone apps [Insight 12], and the direct analysis of participants’ retrieved personal data by the Hestia.ai research team (whereas my Case Study explicitly avoided handling participants’ personal data). The project resulted in three reports:

At the time of publication of this thesis (August 2022), I continue to be employed by Hestia.ai, working on the research, design and development of tools to help collectives [Insight 10] with data, make data easier to understand [6.1.2; 7.7], and exploring methods to help people ‘hack the seams’ of digital platforms and services [9.4].

Where the BBC internship has helped me to understand the practicalities of connecting people with their personal data in pursuit of Life Information Utilisation [7.6], my work with Hestia.ai has helped me understand the practicalities of how people might acquire greater Personal Data Ecosystem Control [7.6]. In this sense, both peripheral activities have been highly complementary to developing an overview of the pursuit of HDR in practice.

ARI7.3 DERC’s Healthy Eating Web Augmentation Project

As a software developer I have long been aware that one of the biggest challenges in building new data interfaces is gaining programmatic access to the necessary data. With the trend towards cloud-based services and data-centric business practices, it has become increasingly difficult to access all of the data held about users by service providers. Application Programming Interfaces (APIs) are a technical means for programmers to access a user’s data so that third-party applications may be built using that data. Unfortunately, as a result of commercial incentives to lock users in and keep data trapped (Abiteboul, André and Kaplan, 2015; Bowyer, 2018), much of users’ data can no longer be accessed via APIs [8.4]. While GDPR data portability requests do open up a new option for using one’s provider-collected data in third-party applications, this is an awkward and time-consuming route for both users and developers. Web augmentation provides a third possible technical avenue for obtaining data from online service providers. It relies on the fact that a user’s data is loaded to the user’s local machine and displayed within their web browser every time a website is used, so it is possible to extract that data from the browser using a browser extension; this is another seam that can be hacked (see 9.4 and Insight 12). Similarly, once loaded into the browser, a provider’s webpage can be modified to display additional data or useful human-centric functionality that the provider failed to provide.

Figure ARI7.2: Screenshot from a Web-Augmented version of the Just Eat Website, showing hygiene information and offering additional sorting

In order to better understand what is and is not possible using this technique, I participated part-time from 2018 to 2020 as the sole software engineer in a DERC (Digital Economy Research Centre) project. This project used the web augmentation technique to explore how researchers could improve the information given to users of Just Eat, a takeaway food ordering platform in the UK. Hygiene rating information was added for each outlet, along with a feature enabling users to sort by hygiene rating, as shown in Figure ARI7.2. The theoretical basis for this research was published in (Goffe et al., 2021, 2022). While this particular use case does not concern personal data, the technology and techniques used by the project to exploit the browser seam were considered highly relevant to the exploration of HDR-improving possibilities, and the goals of the research project were also human-centric and consistent with this thesis’s research goals: tackling the hegemony of service providers in order to better serve individual needs.
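The data-merging step of the Just Eat augmentation (attach a hygiene rating to each outlet, then offer sorting by it) is sketched below. The project itself worked inside the browser via an extension; this illustration is in Python for consistency with the other examples, and the Outlet type, the (name, postcode) matching key and the function names are all hypothetical. Ratings are assumed to be on the 0 to 5 scale of the Food Hygiene Rating Scheme used in England, Wales and Northern Ireland, and outlets without a known rating are placed last rather than discarded.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Outlet:
    name: str
    postcode: str
    hygiene_rating: Optional[int] = None  # 0-5, or None if no rating was found


def augment_with_ratings(outlets: List[Outlet], ratings: Dict[Tuple[str, str], int]) -> List[Outlet]:
    """Attach a rating to each outlet, matching on (name, postcode), a hypothetical key."""
    for outlet in outlets:
        outlet.hygiene_rating = ratings.get((outlet.name, outlet.postcode))
    return outlets


def sort_by_hygiene(outlets: List[Outlet]) -> List[Outlet]:
    """Highest rating first; unrated outlets go last, keeping their original order."""
    return sorted(outlets, key=lambda o: (o.hygiene_rating is None, -(o.hygiene_rating or 0)))
```

Because Python’s `sorted` is stable, unrated outlets keep the order in which the platform listed them.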

ARI7.4 Special Attribution Note for Part Two

This is a note about the attribution of insights within Chapters 7, 8 and 9, as the ideas there originated quite differently from those in the rest of the thesis.

This thesis is my own work. All ideas synthesised in Part Two are original. Some of the specific details, theories and ideas presented in Part Two arose or were developed or augmented through my close collaboration, discussion and ideation with other researchers both alongside and prior to the PhD timeframe, including:

Due to these collaborations and the ongoing and parallel nature of many of these projects to my PhD research, it is impossible to precisely delineate the origin of each idea or insight. In practice, ideas from my developing thesis and own thinking informed the projects’ trajectories and thinking, and vice-versa. These ideas would not have emerged in this form without my participation, so they are not the sole intellectual property of others, but equally I would not have reached the same conclusions alone, so the ideas are not solely my own either. All diagrams and illustrations were produced by me, except where specified, and the overall synthesis and framing presented in this chapter is my own original work. Where this chapter includes material from the four peripheral projects [7.2], that material is either already public, or permission has been obtained from the corresponding individuals or project teams.

ARI7.5 Eight Lenses on Personal Data

This table is referenced and contextualised in section 7.4.

Table ARI7.1 - Eight Lenses on Personal Data.
Way of thinking about data Explanation & Implications
Data as property Data can be considered as a possession. This highlights issues of ownership, responsibility, liability and theft.
Data as a source of information about you Knowing that data contains encoded assertions about you and can be used to derive further conjectures enables thinking about how it might be exploited by others, but also how you can explore and use it yourself for reflection, asking questions, self-improvement and planning. It invites consideration of the right to access, data protection, and issues around accuracy, fairness and misinterpretation / misuse.
Data as part of oneself A photo or recording of you, or a typed note or search that popped into your head could be deeply personal. This lens on data highlights issues around emotional attachment/impact, privacy, and ethics.
Data as memory Data can be considered as an augmentation to one’s memory, a digital record of your life. This lens facilitates design thinking around search and recall, browsing, summarising, cognitive offloading, significance/relevance, and the personal value of data.
Data as creative work Some of the data we produce (e.g. writing, videos, images) can be considered as an artistic creation. This lens enables thinking about attribution, derivation, copying, legacy and cultural value to others.
Data as new information about the world Data created by others can inform us about previously unknown occurrences in our immediate digital life or the wider world. This lens is useful for thinking about discovery, recommendations, bias, censorship, filter bubbles, and who controls the information sources we use, as well as who will see and interpret data that we generate and what effects our data has on others.
Data as currency Many data-centric services require data to be sacrificed in exchange for access to functionality, and some businesses now explicitly enable you to sell your own data. This lens highlights that data can be thought of as a tradable asset, and invites consideration of issues of data’s worth, individual privacy, exploitation and loss of control.
Data as a medium for thinking, communicating and expression Some people collect and organise data into curated collections, or use it to convey facts and ideas, to persuade or to evoke an emotional impact. This lens is useful to consider data uses such as lists, annotation, curation, editing, remixing, visualisation and producing different views of data for different audiences.

ARI9 Additional Reference Information for Chapter 9

ARI9.1 How I compelled Spotify to improve their GDPR return

In this section, I will provide additional details of my mini-case study where I was able to get Spotify to improve the quality of their GDPR returns, as referenced in Insight 9 within section 9.2.1.

As an avid user of the music streaming service Spotify for several years, with a large library of playlists, I have made a number of GDPR requests to get copies of my personal data.

When I first requested a copy of my personal data, I received a basic ZIP file including 12 JSON files containing playlists, search queries, account information, my last 12 months of track play history, and inferences about my musical tastes. Spotify also make an extended data download available, including technical log data and extended play history (covering the lifetime of my account).

I requested this extended download and received a much larger dataset with 175 JSON files, including granular details of when I had used different interface features and the precise details of every song I had ever played.

Thinking that I would like to use this data to build a view of my listening history that was not tied to the Spotify platform (in line with the idea of increasing agency by separating one’s data from the service that holds it [Chapter 8]), I examined the streaming history and playlist data with this purpose in mind. I found that individual songs were identified only by textual strings of the title, artist and album name. This information is insufficient for a programmer’s use: there is no unique identifier or Uniform Resource Identifier (URI) to identify the specific version and release of a track played. Also, without such an identifier, it would not be possible to generate a thumbnail image of the track, or to build functionality such as a clickable link to ‘play this track in Spotify’.

This illustrates a common issue with data access requests, as discussed in 5.4.3: there is ambiguity over whether providers should identify data in a machine-readable way (useful for programming) or in a human-readable way (to optimise understanding). In my case, I needed both. I e-mailed Spotify back and was provided with an alternative file set which contained only Spotify Track URIs, such as spotify:track:4cOdK2wGLETKBW3PvgPWqT. These met the programmer’s need to uniquely identify the track, but not the human need: I had no idea which artist or track each of these URIs corresponded to, as there was no human-readable text accompanying each entry.

So I wrote to Spotify again, making the case that my GDPR rights had not been fully satisfied, because for each play history entry I needed both a machine-readable ID and the human-readable track title and artist name.

I sent Spotify over 30 e-mails on this matter between October 2020 and May 2021. There was little continuity of conversation between support agents, and it was hard to get my query escalated to staff with the technical or legal expertise to assist with such nuanced questions.

However, by persistently and politely repeating my questions and not accepting ‘No’ for an answer, I was able to achieve a notable outcome: Spotify changed the format of their data returns, not just for me but for all future customers. Now every item in the playback history data returned by Spotify includes both textual track and artist details AND a Spotify track URI. The data can now be understood by both human and machine.

The likely interpretation is that I successfully persuaded their Data Protection Officers (who handle GDPR requests) of the importance of returning data that is both machine-readable and human-understandable. Perhaps they also recognised the amount of work they had invested in supporting my query, and wanted to avoid repeating such work should I or any other customer make the same request in future. This was a small impact, but a lasting one, and it shows that the discovery-driven activism / civic hacking approach [9.2] can improve HDR with a target organisation.


Bibliography

Abiteboul, S., André, B. and Kaplan, D. (2015) Managing your digital life with a Personal information management system, Communications of the ACM. ACM, 58(5), pp. 32–35. doi: 10.1145/2670528.
Berners-Lee, T. (2022) ‘Solid: Sir Tim Berners-Lee’s vision of a vibrant web for all’. Inrupt. available at: https://inrupt.com/solid/.
Bowyer, A. (2018) Free Data Interfaces: Taking Human-Data Interaction to the Next Level, CHI Workshops 2018. available at: https://eprints.ncl.ac.uk/273825.
Bowyer, A. et al. (2018) Understanding the Family Perspective on the Storage, Sharing and Handling of Family Civic Data, in Conference on human factors in computing systems - proceedings. New York, New York, USA: ACM Press, pp. 1–13. doi: 10.1145/3173574.3173710.
Bowyer, A. (2020a) ‘Design research for Cornmarket PDS, recommender & associated permissions: Report by Alex Bowyer (BBC Research intern/Open Lab PhD)’. available at: https://bit.ly/bbc-pds-research-bowyer.
Bowyer, A. (2020b) ‘Designing personal data interfaces - a multi-disciplinary challenge’. available at: https://bit.ly/bbc-internship-alex-bowyer (accessed: 18 August 2022).
Bowyer, A., Pidoux, J., et al. (2022) Digipower technical reports: Auditing the data economy through personal data access. doi: 10.5281/zenodo.6554177.
Bowyer, A., Holt, J., et al. (2022) ‘Human-GDPR interaction: Practical experiences of accessing personal data’, CHI ’22.
Braun, V. and Clarke, V. (2006) Using thematic analysis in psychology, Qualitative Research in Psychology. Taylor & Francis, 3(2), pp. 77–101. doi: 10.1191/1478088706qp063oa.
British Broadcasting Corporation (1997) ‘Our purpose’. available at: https://www.bbc.co.uk/rd/about/our-purpose (accessed: 18 August 2022).
Daniel Turner (2014) ‘Quirkos cloud’. available at: https://www.quirkos.com/learn-qualitative/features.html.
Dehaye, P.-O. (2019) ‘Hestia.ai: About us’. available at: https://hestia.ai/en/about/.
Goffe, L. et al. (2021) ‘Appetite for disruption: Designing human-centred augmentations to an online food ordering platform’, in 34th British HCI Conference, pp. 155–167.
Goffe, L. et al. (2022) ‘Web augmentation for well-being: The human-centred design of a takeaway food ordering digital platform’, Interacting with Computers.
Goodwins, R. (2021) ‘Sir Tim Berners-Lee and the BBC stage a very British coup to rescue our data from Facebook and friends’, The Register. available at: https://www.theregister.com/2021/10/04/column_data_privacy/ (accessed: 25 August 2022).
Härkönen, T. et al. (2022) Tracking digipower: How data can be used for influencing decision-makers and steering the world. Sitra. available at: https://www.sitra.fi/en/publications/tracking-digipower/.
Härkönen, T. and Vänskä, R. (2021). Sitra. available at: https://www.sitra.fi/en/projects/digipower-investigation/#what-is-it-about.
Hoofnagle, C. J., Sloot, B. van der and Borgesius, F. Z. (2019) The European Union general data protection regulation: What it is and what it means, Information and Communications Technology Law. Taylor & Francis, 28(1), pp. 65–98. doi: 10.1080/13600834.2019.1573501.
Huberman, M. and Miles, M. B. (2002) The qualitative researcher’s companion. Sage.
Information Commissioner’s Office (2014) Data controllers and data processors: what the difference is and what the governance implications are, p. 20. available at: https://ico.org.uk/for-organisations/guide-to-data-protection/introduction-to-data-protection/some-basic-concepts/.
Kanter, J. (2021) ‘BBC and Sir Tim Berners-Lee app mines Netflix data to find shows viewers like’, The Times. available at: https://www.thetimes.co.uk/article/bbc-and-sir-tim-berners-lee-app-mines-netflix-data-to-find-shows-viewers-like-lxp002gg8 (accessed: 25 August 2022).
Moraveji, N. et al. (2007) ‘Comicboarding: Using comics as proxies for participatory design with children’, in Conference on Human Factors in Computing Systems - Proceedings. ACM, pp. 1371–1374. doi: 10.1145/1240624.1240832.
Orphanides, K. G. (2021) ‘The BBC’s radical new data plan takes aim at Netflix’, Wired UK. available at: https://www.wired.co.uk/article/bbc-data-personalisation.
Pidoux, J. et al. (2022) Digipower technical reports: Understanding influence and power in the data economy. doi: 10.5281/zenodo.6554155.
Ricklefs, H. et al. (2021) ‘Stronger together: Cross service media recommendations’, International Broadcasting Convention. available at: https://www.ibc.org/download?ac=18659 (accessed: 25 August 2022).
Sharp, E. (2021) ‘Personal data stores: Building and trialling trusted data services - BBC R&D’, BBC R&D Blog. available at: https://www.bbc.co.uk/rd/blog/2021-09-personal-data-store-research.
Sharp, E. and Bowyer, A. (2022) ‘Building trusted data services and capabilities’. available at: https://paper.dropbox.com/doc/Building-trusted-data-services-and-capabilities-Us49Ek0nex7yClKughPN4 (accessed: 18 August 2022).
Sitra (1967). available at: https://www.sitra.fi/en/topics/strategy-2/#what-is-sitra (accessed: 18 August 2022).
Sitra (2018) ‘Sitra’s fair data economy theme: What is it about?’ available at: https://www.sitra.fi/en/themes/fair-data-economy/#what-is-it-about (accessed: 18 August 2022).
Spinuzzi, C. (2005) ‘The methodology of participatory design’, Technical Communication. Society for Technical Communication, 52, pp. 163–174.
The European Parliament and the Council of the European Union (2016) Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, pp. 16–32. available at: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32016R0679 https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32016R0679&from=ES.
Tim Davie (BBC Director-General), Richard Sharp (BBC Chairman) and Clare Sumner (Director of Policy) (2022) ‘House of Lords Communications and Digital Select Committee’, parliamentlive.tv. available at: https://parliamentlive.tv/event/index/7d249bcf-78e9-447b-907c-81df72b87542?in=15:01:35 (accessed: 25 August 2022).
Turitzin, M. and Patel, J. (2010) ‘Workflowy’. available at: https://www.workflowy.com/features/.
Woods, B. (2022) ‘BBC wages war on online echo chambers with “unbiased” tech’, The Telegraph. available at: https://www.telegraph.co.uk/business/2022/06/09/bbc-wages-war-online-echo-chambers-unbiased-tech/ (accessed: 25 August 2022).

1. Some leisure categories (namely Shopping and Transport) were included that are not strictly civic data, as these would be useful for exploring issues around ethics. These also provided a reference point for participants to better consider the ‘big data’ benefits of data linking.